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5 CROSS-REFERENCES TO RELATED APPLICATION(S) 

[01] The present application claims the henefit of priority under 35 U.S.C. § 1 19 

from U.S. Provisional Patent Application Serial No. 60/273,816, entitled "METHOD AND 
SYSTEM FOR REAL-TIME QUERYING, RETRIEVAL AND INTEGRATION OF DATA 
u FROM DATABASES OVER A COMPUTER NETWORK" filed on March 6, 200 1 , the 
H disclosure of which is hereby incorporated by reference in its entirety for all purposes, 
y i 

|: BACKGROUND OF THE INVENTION 

fU [02] The present invention generally relates to data retrieval. More specifically, the 

present invention relates to a method and system for retrieving and integrating data from one 
j|5 or more databases over a computer network. 

jjy [03] As business-to-business (B2B) technology becomes more widespread, a 

p number of companies have implemented B2B platforms, along the way defining protocols 
1 y which allow for automated standardized interactions between some of their business partners. 

Most often, these protocols are designed to model the paper-based processes - orders, 
20 invoices, etc. - in an effort to more efficiently execute such processes and reduce the 

associated costs. The business objective was to lower operating costs. 

[04] The evolution of the Internet as a communication vehicle between businesses 

has allowed many companies to utilize business-to-business software platforms to link simple 

business processes and transactions with each other - orders, invoices, etc. However, this has 
25 still not enabled businesses within a value chain to truly collaborate and share information to 

allow business partners to make intelligent decisions about when, where and how to conduct 

these transactions. 

[05] B2B transaction automation, outside the firewall, tracks a similar pattern to the 

internal transaction automation companies implemented in the 1980s, and the e-commerce 
30 transaction automation implemented in the late 1990s. Different technologies were used — 
CICS with COBOL for internal business transactions, and e-commerce servers with Java for 



e-cornmerce. The same result occurred. Standard transactions and processes were captured 
and automated to save execution costs. 

[06] Simple transaction execution provides the first level of automation but does 

not wring all the costs out of the business processes involved. History has shown that once 
5 simple transactions were defined, the business problems evolved to require more complex 
decision-making and intelligence. 

[07] Today's computer networking environments and technologies, such as the 

electronic data interchange (EDI), email and ftp, are conventionally used by enterprises to 
share information across a business supply chain for forecasting, planning and execution. 
1 0 However, when information must be collected and generated in short periods of time such as 
K* on an hourly basis or even real time, these technologies often perform below expectations. 
Q [08] Various systems were introduced in an attempt to improve the above- 

%l mentioned situation. For example, one system was introduced to solve planning issues such 

y i 

§ as retailer forecasting and inventory management by connecting retailers' and suppliers' 

\ y 

PJ5 resources through direct connection of their computers through networking. The forecasting 
L> is calculated by doing a serious of reviews on an order in a basically single-to-single 
N; enterprise basis. 

hi [09] Another example is a system that allows data access outside a single 

bt enterprise. The interchange layer of the system allows the system to see the whole supply 
20 chain instead of a single enterprise, which is mostly beneficial to supply chain planning. 
Data from the supply chain is collected and stored on the database. The data is then 
processed using certain parameters and processes to provide supply information for supply 
chain planning. Basically, using certain parameters such as manufacturing capacity, ERP and 
financial support, a forecasting module can be set up to evaluate information that is necessary 
25 for supply chain planning. Data is collected and then computed before it is accessible for 
planning. The system is designed to shorten the time to collect and compute a large variety 
and amount of data. It is useful for forecasting and planning purpose. However, the system 
only allows for access of computed data that can be computed with a certain amount of time 
delay. It cannot provide up-to-date and accurate data to support the supply chain real time 
30 decision-making. When a specific piece of data is required by the user that is out of the 
syllabus of the forecasting model, the computed data also lacks flexibility to fulfill the 
requirement. 
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[10] Therefore, it would be desirable to provide a method and system which is 

capable of querying, retrieving and integrating data from databases over a computer network 
on a real-time basis in a more efficient manner. 

5 SUMMARY OF THE INVENTION 

[11] A method and system for retrieving and integrating data from multiple 

databases over a computer network is provided. An exemplary embodiment of the present 
invention includes a system having an agregation server and a number of agents. The 
aggregation server is capable of communicating with the agents via a computer network such 
1 0 as the Internet. Each agent is designed to communicate locally with a number of data 
H- sources. A user is able to retrieve data from the data sources by contacting the aggregation 
O server which, in turn, causes the appropriate agents to retrieve the requested data from the 
relevant data sources. 

[12] According to an exemplary embodiment, when the user issues a request to the 

aggregation server to retrieve certain data, the request is converted into an internal query by 
the aggregation server. The internal query is then matched against a set of rules. Each rule 
specifies how an internal query is to be partially satisfied using one or more data sources. For 
a set of rules matching the internal query, a subquery is generated. All the generated 
subqueries of the internal query are then used by their respective agents to retrieve the 
20 requested data. Optionally, all the generated subqueries are optimized to effect more efficient 
retrieval of data from the respective data sources. When the requested data is returned from 
all the relevant agents, the requested data is then joined, fused and unioned to produce the 
final results representing the collective data responsive to the internal query. Star union is 
used to optimize the union and join operations. 
25 [13] Reference to the remaining portions of the specification, including the 

drawings and claims, will realize other features and advantages of the present invention. 
Further features and advantages of the present invention, as well as the structure and 
operation of various embodiments of the present invention, are described in detail below with 
respect to accompanying drawings, like reference numbers indicate identical or functionally 
30 similar elements. 

BRIEF DESCRIPTION OF THE DRAWINGS 
[14] Fig. 1 is a simplified block diagram illustrating an exemplary embodiment of 

the present invention; 
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[15] Fig. 2 is a flow diagram illustrating the data integration process performed by 

an exemplary embodiment of the present invention; 

[16] Fig. 3 is an illustrative example of an input query request in accordance with 

an exemplary embodiment of the present invention; 
5 [17] Fig. 4 is an illustrative example of a query definition file in accordance with 

an exemplary embodiment of the present invention; 

[18] Fig. 5 is an illustrative example of a mediator specification file in accordance 

with an exemplary embodiment of the present invention; 

[19] Fig. 6 is an illustrative example of an agent capability file in accordance with 

10 an exemplary embodiment of the present invention; and 

jM, [20] Figs. 7a and 7b are illustrative examples of a query mapping file in accordance 

with an exemplary embodiment of the present invention. 

Ul 

m 

jp DETAILED DESCRIPTION OF THE INVENTION 

Si> [21] The present invention in the form of one or more exemplary embodiments will 

^ now be described. Fig. 1 is a simplified block diagram illustrating an exemplary embodiment 
Lh of the present invention. Referring to Fig. 1, there is a shown a system 10 representing an 
j exemplary embodiment of the present invention. The system 1 0 includes an aggregation 
O server 12, a number of agents 14, and a number of data sources 16. The data sources 16 
"20 include, for example, databases and applications which is capable of supplying data. The 
data sources 16 are generally organized into groups based on one or more predetermined 
criteria. For example, data sources 16a-c may be grouped together because these data sources 
16a-c reside on a single computer system belonging to the same company. However, it 
should be understood that the data sources 16 need not reside on a single computer system. A 
25 person of ordinary skill in the art will know of other ways to organize a group of data 
sources. Moreover, each of the data sources 16 within a group may be different. For 
instance, within a group, one data source may be a database manufactured by one vendor, 
such as, IBM, and another data source may be a database from a second vendor, such as, 
Oracle. Each agent is designed to communicate with a specific group of data sources 16 to 
30 retrieve and integrate the desired or requested data. 

[22] The system 10 generally operates in the following exemplary manner. When a 

user 18 wishes to retrieve certain data, the user 1 8 issues a request to the aggregation server 
12. In an exemplary embodiment, the user 18 uses a graphical user interface on a computer 
to relay the request to the aggregation server 12 via a computer network 20a, such as, the 
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Internet. The request is encoded in XML format for delivery from the user 1 8 to the 
aggregation server 12. In an alternative embodiment, the user 18 may interact directly with 
the aggregation server 12 without going through any computer network. 
[23] Upon receiving the request, the aggregation server 12 processes the request 

5 and determines which one or more of the agents 14 have access to the requested data. Upon 
identifying these agents 14, the aggregation server 12 communicates with these agents 14 via 
a computer network 20b to retrieve the requested data. This computer network 20b can also 
be, for example, the Internet. Hence, the computer networks 20a,b may or may not be the 
same. 

1 0 [24] Each identified agent 14 further processes the request received from the 

u : aggregation server 12 and accesses the appropriate data sources 16 to retrieve the requested 
S data. The agent 14 then integrates the retrieved data and forwards it to the aggregation server 

yi 12. The integrated data can be formatted in XML or SOAP and then forwarded to the 

01 

45 aggregation server 12 via the computer network 20b using a number of transfer protocols 

jp including, for example, HTTP. Based on the disclosure provided herein, a person of ordinary 

« skill in the art will know of other formats and transfer protocols which can be used to 

y ( implement the data transfer between the agent 14 and the aggregation server 12. 
I [25] Upon receiving the retrieved data from all the relevant agents 14, the 

O aggregation server 12 then integrates all the retrieved data and presents it to the user 18. 

Hi 

20 Details with respect to how each agent 14 and the aggregation server 12 retrieves and 
integrates the requested data will be described further below. 

[26] The operation of the system 10 is further illustrated in a more practical context 

as follows. In one exemplary configuration, an agent 14 resides on the private computer 
network of a company and is able to communicate locally with that company's internal data 

25 sources, such as, databases and applications. When a customer of the company wishes to 
obtain certain information relating to, for example, his/her purchase order, the customer 
issues a request to the aggregation server 12. The request is processed and then relayed by 
the aggregation server 12 to the agent 14 via, for example, the Intranet. The agent 14, in turn, 
retrieves the requested information from the company's data sources and then integrates the 

30 retrieved information for delivery to the aggregation server 12 which subsequently forwards 
the information to the customer. 

[27] Fig. 2 is a flow diagram illustrating in further detail how a request issued by 

the user 18 is processed so as to cause one or more agents 14 to retrieve and integrate the 
requested data. Referring to Fig. 2, the user 18 uses a request form or a graphical user 
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interface to enter the request for data. The request form contains a number of different fields. 
Different request forms may be available to the user 18 to request different types of data. In 
an exemplary embodiment, the request form (and the information contained therein) is 
converted into an input query request encoded in XML format for delivery to the aggregation 
5 server 12. Fig. 3 shows an illustrative example of the input query request. 

[28] Upon receiving the input query request from the user 1 8, the aggregation 

server 12 breaks down or converts the input query request into an internal query. In 
particular, for each input query request , there is a corresponding request template that can be 
instantiated to the internal query. The internal query is represented by a query definition file. 
10 The query definition file has two parts, namely, a head portion and a tail portion. The head 
H portion represents the query output format, i.e., it describes what data structure is going to be 
q displayed and the way data fusion is going to take place when data responsive to the internal 
ji; query is retrieved. The tail portion represents the query input formats, i.e., it specifies what 
4* kind of data is to be retrieved and the requisite input arguments or parameters needed to 
jt1> retrieve such data. The tail portion is made up of a conjunctive set of query input formats. 
!L. Fig. 4 provides an illustrative example of the query definition file. The purpose and use of 
y* the query definition file will be further described below. 

i rj [29] Once the internal query (and the corresponding query definition file) is 

O created, the internal query is evaluated against a set of rules. This set of rules specifies where 
20 and how different internal queries are to be satisfied. For example, one rule may specify that 
a specific internal query can be satisfied by a first and a second data source; another rule may 
specify that this same specific internal query can also be satisfied by a third and a fourth data 
source. More generally, a rule can specify how a subset of the conjunctive set of query input 
formats in the tail portion of the internal query can be satisfied. A combined set of rules can 
25 then be used to specify how the entire internal query can be satisfied. The set of rules are 
designed based on the structures, constraints and contents of the data sources. In an 
exemplary embodiment, this set of rules is stored in a mediator specification file residing on 
the aggregation server 12. Each rule in the mediator specification file also includes a head 
portion. Similar to the tail portion of the query definition file, the head portions of the set of 
30 rules in the mediator specification file also represent the query input formats, i.e., it specifies 
what kind of data is to be retrieved and the requisite input arguments or parameters needed to 
retrieve such data. The use of this similar functionality will be further explained below. Fig. 
5 provides an illustrative example of the mediator specification file. 
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[30] Referring back to Fig. 2, the internal query is evaluated against each rule 

within the set of rules in the mediator specification file. More specifically, the corresponding 
query definition file for the internal query is examined to determine if the tail portion of the 
query definition file matches a set of the head portions of rules within the mediator 
5 specification file. In other words, if each of the input query formats of the tail portion of the 
query definition file is matched with the head portion of one rule within the set of rules, then 
that set of rules is considered to be a match with the internal query. It should be noted that 
for a rule to be considered a match with a subset of input query formats of the tail portion of 
the query definition file, the subset of input query formats of the tail portion of the query 
1 0 definition file does not have to be identical to the head portion of the rule in the mediator 
N« specification file. It is sufficient to be considered a match if the subset of input query formats 
f2 of the tail portion of the query definition file is equal to part or all of the head portion of the 

y j rule. In other words, the head portion of the rule may be a superset of the tail portion of the 

y i 

..=p query definition file; conversely, the tail portion of the query definition file may be a superset 

jf f> of the head portion of the rule. 

^ [31] For each set of matched rules, the aggregation server 12 generates a subquery. 

M, For each internal query, one or more subqueries may be generated. Each subquery identifies 
\ the data sources that are used to supply data which are responsive to the internal query as 

Q well as the agent 14 that is designed to access those identified data sources. Optionally, as 

nJ 

20 will be further described below, the aggregation server 12 analyzes the subqueries and 

formulates a query execution plan to optimize the execution of the subqueries by the relevant 
agents 14. 

[32] For each subquery, the aggregation server 12 determines if the subquery can 

be executed with a set of agents. The set of agents may include one or more agents. All the 

25 relevant agents 14 are then identified from the subqueries. Each agent has a corresponding 
agent capability file. Fig. 6 provides an illustrative example of an agent capability file. The 
corresponding agent capability file for each relevant agent is checked to determine if that 
particular agent is capable of returning the data specified by the associated subquery, A 
particular agent may not be able to participate in executing the associated subquery due to a 

30 number of factors. 

[33] If it is determined that a relevant agent is capable of participating in executing 

the associated subquery, the associated subquery is then formatted appropriately into an agent 
request and transmitted to the relevant agents for execution. In an exemplary embodiment, 
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the aggregation server 12 encodes the associated subquery into the agent requests in XML 
format and forwards it to the relevant agents via the Internet. 

[34] Upon receiving the agent requests which embody the subquery, each of the 

relevant agents then identifies a query mapping file which corresponds to the received agent 
5 request. The query mapping file is used to map information between data in a desired format 
and native data retrieved from the data sources pursuant to the subquery. Furthermore, the 
query mapping file also includes information on how to connect to a data source thereby 
allowing the relevant agent to access the data source. For example, one data source to be 
accessed may be a database and another data source may be an application which 
10 communicates via an application programming interface. Figs. 7a and 7b provide illustrative 

examples of the query mapping file. 
Q [35] For each subquery embodied in a set of agent requests, i.e., at the agent level, 

jjj data which is responsive to the subquery is retrieved by the set of relevant agents from the 
01 relevant data sources. Each agent then performs join operations on the retrieved data and 
ft0 encodes the joined data in the appropriate format for delivery to the aggregation server 12. In 

an exemplary embodiment, the joined data is encoded in the XML format, 
p [36] Upon receiving the data from the relevant agents which executed the 

pj subqueries, the aggregation server 12 performs join, fusion and union operations on all the 
received data. The fusion operation aggregates data from different agents based on a set of 
020 attribute values defined in the head portion of the query definition file. The union operation 
unions together all aggregated data returned by the relevant agents. 

[37] It should be noted that the volume of data which is responsive to the internal 

query may be quite high and, due to system constraints and/or other requirements, all the 
responsive data may not be retrieved by the relevant agents or processed by the aggregation 

25 server 12 all at once. Therefore, the amount of responsive data which is to be retrieved by the 
relevant agents and/or processed by the aggregation server 12 at any one time is configurable. 
[38] As mentioned above, the aggregation server 12 can also analyze the 

subqueries and formulate a query execution plan to optimize the execution of the subqueries 
by the relevant agents 14. Take, for example, an internal query that has three sets of matched 

30 rules thereby generating three subqueries. The three subqueries are the same except that they 
each require access to different combinations of data sources. For instance, the first subquery 
specifies that data source A and data source B are to be accessed; the second subquery 
specifies that data source A and data source C are to be accessed; and the third subquery 
specifies that data source A and data source D are to be accessed. Without any optimization, 
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the agent requests corresponding to the subqueries are executed by their respective agents 
independently. Consequently, data source A is accessed three times. 
[39] Optionally, the aggregation server 12 may optimize the execution of the 

subqueries as follows. First, the common data source shared by the subqueries is identified. 
5 A common key shared by all the data sources accessed by the subqueries is also identified. 
Then, the first subquery is executed against the common data source to retrieve, amongst 
other data, a list of possible values for the common key. The list of retrieved values for the 
common key is then concurrently passed to the relevant agents to be used to access the other 
data sources pursuant to the subqueries. The results retrieved pursuant to the subqueries are 

10 then unioned together to generate the final, appropriate results. This overall union operation 

y* is called star union. 

q [40] The foregoing optimization process is illustrated using the example given 

U * above. Furthermore, assume in the example that data source A is used to store information 

m 

J!; on component parts and their respective descriptions. The information is indexed or keyed 

fi i 

jf|5 by one field, part number. Data sources B, C and D are used to store information on 

5 _ component parts and their respective quantities for suppliers B, C and D. The information is 

indexed or keyed by one field, part number. Using the foregoing optimization process as 

{ f! described above, the common data source shared by the three subsqueries is data source A. 

Mr.; 

C3 The common key shared by data sources A, B, C and D is the part number field. Using this 

ru 

20 information, the first subquery is executed against data source A and a list of values indexed 
by the part number field is retrieved. This list of values represents the part number 
information requested by the first subquery. This list of values is then used to retrieve the 
relevant information from each of the remaining data sources B, C and D pursuant to the 
respective subqueries. The data retrieval from the data sources B, C and D can be done in a 

25 concurrent manner. Pursuant to the first subquery (which specifies access to data sources A 
and B), the results retrieved from data source A are then joined with results retrieved from 
data source B. The joined results represent information related to selected component parts, 
their respective descriptions and quantities which are available from supplier B. Similarly, 
pursuant to the second subquery (which specifies access to data sources A and C), the results 

30 retrieved from data source A are then joined with results retrieved from data source C. The 
joined results represent information related to selected component parts, their respective 
descriptions and quantities which are available from supplier C. Likewise, pursuant to the 
third subquery (which specifies access to data sources A and D), the results retrieved from 
data source A are then joined with results retrieved from data source D. The joined results 
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represent information related to selected component parts, their respective descriptions and 
quantities which are available from supplier D. All the respective joined results are then 
fused together so that for each part number, the data retrieved from data sources B, C and D 
are aggregated together in the final results which satisfy the internal query. 

5 [41] In an exemplary embodiment, the present invention is implemented in the 

form of control logic in either a modular or integrated manner using software. Based on the 
disclosure provided herein, however, it will be appreciated by a person of ordinary skill in the 
art that the present invention can also implemented using other methods and/or techniques, 
such as, hardware implementation and a combination of software and hardware 

10 implementation. 

[42] It is understood that the examples and embodiments described herein are for 

n illustrative purposes only and that various modifications or changes in light thereof will be 
S suggested to persons skilled in the art and are to be included within the spirit and purview of 
ft this application and scope of the appended claims. All publications, patents, and patent 
ffj 5 applications cited herein are hereby incorporated by reference for all purposes in their 
m entirety. 
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